14 research outputs found

    Adapting by copying. Towards a sustainable machine learning

    [eng] Despite the rapid growth of machine learning in the past decades, deploying automated decision-making systems in practice remains a challenge for most companies. On an average day, data scientists face substantial barriers to putting models into production. Production environments are complex ecosystems, still largely based on on-premise technology, where modifications are time-consuming and costly. Given the rapid pace at which the machine learning environment changes these days, companies struggle to stay up to date with the latest software releases, changes in regulation and the newest market trends. As a result, machine learning often fails to deliver according to expectations. More worryingly, this can result in unwanted risks for users, for the company itself and even for society as a whole, insofar as the negative impact of these risks is perpetuated over time. In this context, adaptation is an instrument that is both necessary and crucial for ensuring a sustainable deployment of industrial machine learning. This dissertation is devoted to developing theoretical and practical tools to enable adaptation of machine learning models in company production environments. More precisely, we focus on devising mechanisms to exploit the knowledge acquired by models to train future generations that are better fit to meet the stringent demands of a changing ecosystem. We introduce copying as a mechanism to replicate the decision behaviour of a model using another that presents differential characteristics, in cases where access to both the models and their training data is restricted. We discuss the theoretical implications of this methodology and show how it can be performed and evaluated in practice.
Under the conceptual framework of actionable accountability, we also explore how copying can be used to ensure risk mitigation in circumstances where deployment of a machine learning solution results in a negative impact on individuals or organizations.

    Differential Replication for Credit Scoring in Regulated Environments

    Differential replication is a method to adapt existing machine learning solutions to the demands of highly regulated environments by reusing knowledge from one generation to the next. Copying is a technique that allows differential replication by projecting a given classifier onto a new hypothesis space, in circumstances where access to both the original solution and its training data is limited. The resulting model replicates the original decision behavior while displaying new features and characteristics. In this paper, we apply this approach to a use case in the context of credit scoring, using a private residential mortgage default dataset. We show that differential replication through copying can be exploited to adapt a given solution to the changing demands of a constrained environment such as that of the financial market. In particular, we show how copying can be used to replicate the decision behavior not only of a model, but also of a full pipeline. As a result, we can ensure the decomposability of the attributes used to provide explanations for credit scoring models and reduce the time to market of these solutions.

    Environmental adaptation and differential replication in machine learning

    When deployed in the wild, machine learning models are usually confronted with an environment that imposes severe constraints. As this environment evolves, so do these constraints. As a result, the feasible set of solutions for the considered need is prone to change in time. We refer to this problem as that of environmental adaptation. In this paper, we formalize environmental adaptation and discuss how it differs from other problems in the literature. We propose solutions based on differential replication, a technique where the knowledge acquired by the deployed models is reused in specific ways to train more suitable future generations. We discuss different mechanisms to implement differential replications in practice, depending on the considered level of knowledge. Finally, we present seven examples where the problem of environmental adaptation can be solved through differential replication in real-life applications.

    Copying Machine Learning Classifiers

    We study copying of machine learning classifiers, an agnostic technique to replicate the decision behavior of any classifier. We develop the theory behind the problem of copying, highlighting its properties, and propose a framework to copy the decision behavior of any classifier using no prior knowledge of its parameters or training data distribution. We validate this framework through extensive experiments using data from a series of well-known problems. To further validate this concept, we use three different use cases where desiderata such as interpretability, fairness or productivization constraints need to be addressed. Results show that copies can be exploited to enhance existing solutions and improve them by adding new features and characteristics.
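
The copying idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy `black_box` function stands in for the original classifier, synthetic points are drawn uniformly from a square, and the copy is a simple k-nearest-neighbour rule. Agreement with the original on fresh synthetic points is reported as a fidelity score.

```python
import random

def black_box(x, y):
    """Hypothetical stand-in for the original classifier; only its predictions are queried."""
    return 1 if x * x + y * y < 0.5 else 0

def sample_points(n, rng):
    """Draw synthetic points uniformly from [-1, 1]^2; no original training data is used."""
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

def fit_knn_copy(points, labels, k=5):
    """Return a k-nearest-neighbour copy built from the synthetic labelled set."""
    def predict(x, y):
        # Indices of the k synthetic points closest to the query point.
        nearest = sorted(
            range(len(points)),
            key=lambda i: (points[i][0] - x) ** 2 + (points[i][1] - y) ** 2,
        )[:k]
        votes = sum(labels[i] for i in nearest)
        return 1 if 2 * votes > k else 0  # majority vote
    return predict

rng = random.Random(0)
synthetic = sample_points(2000, rng)
synthetic_labels = [black_box(x, y) for x, y in synthetic]  # label by querying the original
copy_model = fit_knn_copy(synthetic, synthetic_labels)

# Fidelity: fraction of fresh synthetic points where copy and original agree.
held_out = sample_points(500, rng)
fidelity = sum(copy_model(x, y) == black_box(x, y) for x, y in held_out) / len(held_out)
print(f"fidelity on fresh synthetic points: {fidelity:.3f}")
```

The copy never sees the parameters or training data of `black_box`, only its outputs on sampled inputs. Swapping the k-nearest-neighbour rule for another model family is what would give the copy different characteristics from the original.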

    Risk mitigation in algorithmic accountability: The role of machine learning copies

    Machine learning plays an increasingly important role in our society and economy and is already having an impact on our daily life in many different ways. From several perspectives, machine learning is seen as the new engine of productivity and economic growth. It can increase business efficiency, improve any decision-making process and, of course, spawn the creation of new products and services by using complex machine learning algorithms. In this scenario, the lack of actionable accountability-related guidance is potentially the single most important challenge facing the machine learning community. Machine learning systems are often composed of many parts and ingredients, mixing third-party components or software-as-a-service APIs, among others. In this paper we study the role of copies for risk mitigation in such machine learning systems. Formally, a copy can be regarded as an approximated projection operator of a model into a target model hypothesis set. Under the conceptual framework of actionable accountability, we explore the use of copies as a viable alternative in circumstances where models can be neither retrained nor enhanced by means of a wrapper. We use a real residential mortgage default dataset as a use case to illustrate the feasibility of this approach.
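
One way to write down the projection view mentioned in this abstract is sketched below. The notation is assumed here for illustration and is not taken from the paper: $f_{\mathcal{O}}$ is the original model, $\mathcal{H}_{\mathcal{C}}$ the target hypothesis set, $P_Z$ a sampling distribution over the input space, and $\ell$ a loss that penalizes disagreement. The second line is the "approximated" part: the expectation is replaced by an average over $N$ synthetic points.

```latex
f_{\mathcal{C}}^{*} \in \arg\min_{f \in \mathcal{H}_{\mathcal{C}}}
  \mathbb{E}_{z \sim P_Z}\left[ \ell\big(f(z),\, f_{\mathcal{O}}(z)\big) \right]

\hat{f}_{\mathcal{C}} \in \arg\min_{f \in \mathcal{H}_{\mathcal{C}}}
  \frac{1}{N} \sum_{j=1}^{N} \ell\big(f(z_j),\, f_{\mathcal{O}}(z_j)\big),
  \qquad z_j \sim P_Z
```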

    Towards Global Explanations for Credit Risk Scoring

    In this paper, we propose a method to obtain global explanations for trained black-box classifiers by sampling their decision function to learn alternative interpretable models. The envisaged approach provides a unified solution to approximate non-linear decision boundaries with simpler classifiers while retaining the original classification accuracy. We use a private residential mortgage default dataset as a use case to illustrate the feasibility of this approach to ensure the decomposability of attributes during pre-processing.
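
As a rough sketch of sampling a decision function to learn an interpretable model, the example below fits a single-threshold rule (a decision stump) to labels obtained from a toy opaque classifier. The `black_box` function, the uniform sampling range, the threshold grid and the choice of a stump as the interpretable model are all assumptions made for illustration, not the method of the paper.

```python
import math
import random

def black_box(point):
    """Hypothetical opaque classifier with a non-linear decision boundary."""
    x, y = point
    return 1 if x + 0.1 * math.sin(5 * y) > 0.3 else 0

def fit_stump(points, labels):
    """Pick the rule 'feature > threshold' that agrees most often with the labels.

    Only rules of the form 'predict 1 above the threshold' are searched.
    """
    best = (0.0, None, None)  # (agreement, feature index, threshold)
    for feature in range(len(points[0])):
        for step in range(41):
            threshold = -1 + 0.05 * step  # grid over [-1, 1]
            hits = sum((p[feature] > threshold) == bool(l) for p, l in zip(points, labels))
            agreement = hits / len(points)
            if agreement > best[0]:
                best = (agreement, feature, threshold)
    return best

rng = random.Random(1)
samples = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(3000)]
labels = [black_box(p) for p in samples]  # sample the decision function

# Agreement is measured on the same sampled points used to fit the rule.
agreement, feature, threshold = fit_stump(samples, labels)
print(f"rule: predict 1 if feature[{feature}] > {threshold:.2f} (agreement {agreement:.3f})")
```

The resulting rule can be read directly, which is the sense in which the surrogate is a global explanation. How closely it tracks the original depends on how well one threshold can follow the boundary.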

    Importance of Timely Treatment Initiation in Infantile-Onset Pompe Disease, a Single-Centre Experience

    Classic infantile Pompe disease (IPD) is a rare lysosomal storage disorder characterized by severe hypertrophic cardiomyopathy and profound muscle weakness. Without treatment, death occurs within the first 2 years of life. Although enzyme replacement therapy (ERT) with alglucosidase alfa has improved survival, treatment outcome is not good in many cases and is largely dependent on age at initiation. The objective of the study was (a) to analyse the different stages in the diagnosis and specific treatment initiation procedure in IPD patients, and (b) to compare clinical and biochemical outcomes depending on age at ERT initiation (<1 month of age vs. <3 months of age). Here, we show satisfactory clinical and biochemical outcomes in two IPD patients after early treatment initiation before 3 months of life with immunomodulatory therapy in the ERT-naïve setting, with a high ERT dose from the beginning. Despite the overall good evolution, the patient who initiated treatment <1 month of life presented even better outcomes than the patient who started treatment <3 months of life, with an earlier normalization of hypertrophic cardiomyopathy, along with CK normalization, highlighting the importance of early treatment initiation in this progressive disease before irreversible muscle damage has occurred. This work was partially funded by the Basque Department of Education (IT1281-19).